GM-TCNet: Gated Multi-scale Temporal Convolutional Network using Emotion Causality for Speech Emotion Recognition

Abstract

In human-computer interaction, Speech Emotion Recognition (SER) plays an essential role in understanding the user's intent and improving the interactive experience. Because speech samples with similar sentiment differ in speaker characteristics but share common antecedents and consequences, a key challenge for SER is how to produce robust and discriminative representations through the causality between speech emotions. In this paper, we propose a Gated Multi-scale Temporal Convolutional Network (GM-TCNet) to construct a novel emotional causality representation learning component with a multi-scale receptive field. GM-TCNet deploys this component to capture the dynamics of emotion across the time domain; it is constructed from dilated causal convolution layers and a gating mechanism. In addition, it uses skip connections to fuse high-level features from different gated convolution blocks, capturing the abundant and subtle emotion changes in human speech. GM-TCNet first uses a single type of feature, mel-frequency cepstral coefficients (MFCCs), as input and then passes it through the gated temporal convolutional module to generate high-level features. Finally, these features are fed to the emotion classifier to accomplish the SER task. The experimental results show that our model achieves the highest performance in most cases compared with state-of-the-art techniques.
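The building blocks named in the abstract (dilated causal convolution, a gating mechanism, and skip connections across blocks) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the single-channel input, the kernel size of 2, the weights, the dilation schedule (1, 2, 4), and the WaveNet-style tanh/sigmoid gate are all assumptions made for illustration, and the function names are hypothetical.

```python
import math

def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k*dilation], treating x as zero before t = 0."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # causal: only the current and past samples contribute
                acc += w * x[idx]
        out.append(acc)
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gated_block(x, filter_w, gate_w, dilation):
    """Assumed gating: tanh(filter branch) * sigmoid(gate branch), element-wise."""
    f = causal_dilated_conv(x, filter_w, dilation)
    g = causal_dilated_conv(x, gate_w, dilation)
    return [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]

def gm_tcn_sketch(x, dilations=(1, 2, 4), filter_w=(0.5, -0.3), gate_w=(0.8, 0.2)):
    """Stack gated blocks with growing dilation and sum their outputs (skip connections)."""
    skip_sum = [0.0] * len(x)
    h = list(x)
    for d in dilations:
        h = gated_block(h, filter_w, gate_w, d)
        skip_sum = [s + v for s, v in zip(skip_sum, h)]  # fuse every block's features
    return skip_sum

def receptive_field(kernel_size, dilations):
    """Number of input steps that can influence one output step of the stack."""
    return 1 + (kernel_size - 1) * sum(dilations)

# Stand-in for one MFCC coefficient tracked over 10 frames (made-up values).
frames = [0.1 * i for i in range(10)]
features = gm_tcn_sketch(frames)
```

Because every convolution looks only backwards in time, changing a later frame cannot alter earlier outputs, and stacking dilations of 1, 2, and 4 with a kernel size of 2 lets each output step see 8 input frames. In a full model the fused features would be passed to a classifier, which this sketch omits.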

Similar resources

Speech Emotion Recognition Using Scalogram Based Deep Structure

Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...

Emotion recognition of conversational affective speech using temporal course modeling

In a natural conversation, a complete emotional expression is typically composed of a complex temporal course representing temporal phases of onset, apex, and offset. In this study, subemotional states are defined to model the temporal course of an emotional expression in natural conversation. Hidden Markov Models (HMMs) are adopted to characterize the subemotional states; each represents one t...

Modeling Perceivers Neural-Responses Using Lobe-Dependent Convolutional Neural Network to Improve Speech Emotion Recognition

Developing automatic emotion recognition by modeling expressive behaviors is becoming crucial in enabling the next generation design of human-machine interface. Also, with the availability of functional magnetic resonance imaging (fMRI), researchers have also conducted studies into quantitative understanding of vocal emotion perception mechanism. In this work, our aim is two folds: 1) investiga...

Emotion recognition using imperfect speech recognition

This paper investigates the use of speech-to-text methods for assigning an emotion class to a given speech utterance. Previous work shows that an emotion extracted from text can convey complementary evidence to the information extracted by classifiers based on spectral, or other non-linguistic features. As speech-to-text usually presents significantly more computational effort, in this study we...

Journal

Journal title: Speech Communication

Year: 2022

ISSN: 1872-7182, 0167-6393

DOI: https://doi.org/10.1016/j.specom.2022.07.005